{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 10.1.3 A Simple Code Implementation of the XGBoost Algorithm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "XGBoost can be used for both classification and regression. The corresponding models are the XGBoost classifier (`XGBClassifier`) and the XGBoost regressor (`XGBRegressor`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
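    "XGBoost can be installed with pip. On Windows, for example, press Win+R to open the Run dialog, type `cmd` and press Enter, then type the following command in the Command Prompt window that appears and press Enter to run it:\n",
    "\n",
    "`pip install xgboost`"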
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# In Jupyter Notebook you can instead run the following line in a code cell (uncomment it first):\n",
    "# !pip install xgboost"
   ]
  },
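  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To confirm that the installation succeeded and to see which release is installed, you can print the version numbers. The behavior of XGBoost's scikit-learn interface (for example, which input types `fit()` accepts) has changed between releases, so it is worth checking before running the examples. The following cell is a minimal sketch that relies only on the standard `__version__` attributes of `xgboost` and `numpy`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal sketch: check that xgboost imports correctly and report the installed versions\n",
    "import numpy as np\n",
    "import xgboost\n",
    "\n",
    "print('xgboost version:', xgboost.__version__)\n",
    "print('numpy version:', np.__version__)"
   ]
  },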
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Classification model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Importing the XGBoost classifier:\n",
    "from xgboost import XGBClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# In Jupyter Notebook, once the class has been imported, you can view its official documentation with the following line (uncomment it first):\n",
    "# XGBClassifier?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-11-21T01:55:53.365841Z",
     "start_time": "2020-11-21T01:55:45.279622Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0]\n"
     ]
    }
   ],
   "source": [
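    "# A simple XGBoost classification example:\n",
    "from xgboost import XGBClassifier\n",
    "import numpy as np\n",
    "\n",
    "X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])  # with the 2020 XGBoost release used here, X had to be a NumPy array or a DataFrame, not a list\n",
    "y = [0, 0, 0, 1, 1]  # target labels (two classes)\n",
    "\n",
    "model = XGBClassifier()  # classifier with default hyperparameters\n",
    "model.fit(X, y)  # train the model\n",
    "\n",
    "print(model.predict(np.array([[5, 5]])))  # predict the class of a new sample"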
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
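    "Here `X` holds the feature variables (2 features per sample) and `y` is the target variable. `X` is built as a NumPy array because, in the XGBoost version used here, the classifier does not accept a plain Python list for the features; a NumPy array or a pandas DataFrame can be passed instead. `model = XGBClassifier()` creates the model, `fit()` trains it, and the last line uses `predict()` to predict the class of the new sample `[5, 5]`."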
   ]
  },
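  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since a pandas DataFrame is also accepted as the feature matrix, the following cell is a minimal sketch of the same classification example with DataFrame input. The column names `f1` and `f2` are hypothetical and chosen only for illustration, and the cell assumes pandas is installed. It also calls `predict_proba()`, which returns one probability per class for each sample, so you can see how confident the model is rather than only the predicted label."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal sketch: the same toy data as a pandas DataFrame (column names are hypothetical)\n",
    "import pandas as pd\n",
    "from xgboost import XGBClassifier\n",
    "\n",
    "X_df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], columns=['f1', 'f2'])\n",
    "y = [0, 0, 0, 1, 1]\n",
    "\n",
    "clf = XGBClassifier()\n",
    "clf.fit(X_df, y)  # the DataFrame is passed directly as the feature matrix\n",
    "\n",
    "X_new = pd.DataFrame([[5, 5]], columns=['f1', 'f2'])  # new data uses the same columns as training\n",
    "pred = clf.predict(X_new)  # predicted class label, shape (1,)\n",
    "proba = clf.predict_proba(X_new)  # class probabilities, shape (1, 2): [P(class 0), P(class 1)]\n",
    "print(pred)\n",
    "print(proba)"
   ]
  },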
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. Regression model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Importing the XGBoost regressor:\n",
    "from xgboost import XGBRegressor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# In Jupyter Notebook, once the class has been imported, you can view its official documentation with the following line (uncomment it first):\n",
    "# XGBRegressor?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-11-11T09:22:00.852126Z",
     "start_time": "2020-11-11T09:22:00.828192Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3.0000014]\n"
     ]
    }
   ],
   "source": [
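    "# A simple XGBoost regression example:\n",
    "from xgboost import XGBRegressor\n",
    "import numpy as np\n",
    "\n",
    "X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])  # features: 5 samples, 2 features each\n",
    "y = [1, 2, 3, 4, 5]  # continuous target values\n",
    "\n",
    "model = XGBRegressor()  # regressor with default hyperparameters\n",
    "model.fit(X, y)  # train the model\n",
    "\n",
    "print(model.predict(np.array([[5, 5]])))  # predict the target value of a new sample"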
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
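    "Here `X` holds the feature variables (2 features per sample) and `y` is the target variable. `model = XGBRegressor()` creates the model, `fit()` trains it, and the last line uses `predict()` to predict the target value of the new sample `[5, 5]`."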
   ]
  }
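  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both examples in this section use the default hyperparameters. In practice, parameters such as `n_estimators` (the number of boosting rounds), `max_depth` (the maximum depth of each tree) and `learning_rate` (the shrinkage applied to each tree's contribution) are usually set explicitly. The following cell is a minimal sketch on the same toy regression data; the values chosen are illustrative assumptions, not recommendations. The training-set mean squared error is computed with NumPy only to show how predictions can be compared with the targets; on real data you would evaluate on a held-out set instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal sketch: set a few common hyperparameters explicitly (values are illustrative only)\n",
    "import numpy as np\n",
    "from xgboost import XGBRegressor\n",
    "\n",
    "X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])\n",
    "y = np.array([1, 2, 3, 4, 5], dtype=float)\n",
    "\n",
    "reg = XGBRegressor(\n",
    "    n_estimators=50,  # number of boosting rounds (trees)\n",
    "    max_depth=2,  # maximum depth of each tree\n",
    "    learning_rate=0.3,  # shrinkage applied to each tree's contribution\n",
    ")\n",
    "reg.fit(X, y)\n",
    "\n",
    "y_pred = reg.predict(X)\n",
    "mse = float(np.mean((y_pred - y) ** 2))  # training-set mean squared error\n",
    "print('training predictions:', y_pred)\n",
    "print('training MSE:', mse)"
   ]
  }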
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
